1 Introduction



The models we consider are of the form

$$y = f(X^\top \beta) + \epsilon, \qquad X \in \mathbb{R}^{n \times p},\ \beta \in \mathbb{R}^p,\ y, \epsilon \in \mathbb{R}^n, \tag{1.1}$$



where $f : \mathbb{R} \to \mathbb{R}$ is a known function, $X$ is a fixed design matrix, and $y$ and $\epsilon$ are vectors of observations and errors, respectively. In (1.1) and henceforth, for $x \in \mathbb{R}^n$, we denote $f(x) = (f(x_1), \ldots, f(x_n))^\top$. The parameter $\beta$ is sparse in the sense that the number of its nonzero coordinates is much smaller than its dimension [9].

For $a > 0$ and $v \in \mathbb{R}^p$, denote by $\|v\|_a$ the $\ell_a$-norm of $v$. The support of $v$ is defined to be $\mathrm{spt}(v) := \{i : v_i \ne 0\}$. Denote by $|A|$ the cardinality of a set $A$. The $\ell_0$-norm of $v$ is $\|v\|_0 = |\mathrm{spt}(v)|$. By an $\ell_a$-regularized estimator of $\beta$ we mean

$$\hat\beta = \arg\min_{v \in D}\, \bigl[\ell(y, Xv) + c_r \|v\|_a\bigr], \tag{1.2}$$

where $D \subset \mathbb{R}^p$ is a pre-selected search domain, $\ell(y, Xv)$ a loss function, and $c_r > 0$ a tuning parameter. We are interested in the case where $a = 1$.
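The definitions above can be made concrete with a small sketch. It evaluates the support, the $\ell_0$- and $\ell_a$-norms, and the objective in (1.2). The squared-error loss used as the default is an assumption made only for illustration (the note leaves $\ell$ general), and the helper names `support`, `norm_a`, `matvec` and `objective` are hypothetical.

```python
def support(v):
    """spt(v): the set of indices i with v_i != 0."""
    return {i for i, vi in enumerate(v) if vi != 0}


def norm_a(v, a):
    """l_a-norm of v for a > 0; for a == 0, the l_0-'norm' |spt(v)|."""
    if a == 0:
        return float(len(support(v)))
    return sum(abs(vi) ** a for vi in v) ** (1.0 / a)


def matvec(X, v):
    """Xv for X given as a list of rows."""
    return [sum(xij * vj for xij, vj in zip(row, v)) for row in X]


def objective(y, X, v, c_r, a=1, loss=None):
    """l(y, Xv) + c_r * ||v||_a, the quantity minimized in (1.2).

    The default squared-error loss is an illustrative assumption.
    """
    if loss is None:
        loss = lambda obs, fit: sum((o - f) ** 2 for o, f in zip(obs, fit))
    return loss(y, matvec(X, v)) + c_r * norm_a(v, a)


X = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
y = [1.0, 2.0]
v = [1.0, 0.0, -0.5]
print(sorted(support(v)), norm_a(v, 0), norm_a(v, 1))
print(objective(y, X, v, c_r=0.5))
```

Passing a different `loss` callable (for instance a negative log-likelihood) gives the objective for the other estimators discussed later.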

For models (1.1), much has been learned about the case where $p$ is fixed or much smaller than $n$ (cf. [5; 6] and references therein). The note is concerned with the case where $p$ can be large, possibly much larger than $n$ and, at the same time, $|\mathrm{spt}(\beta)|$ is much smaller than $p$. Under this setting, the case where $f(x) = x$ has been a subject of great interest recently (cf. [1; 2; 3; 8; 10; 11] and references therein).

The main purpose of the note is to establish general results on the estimator (1.2) similar to Proposition 2.1 in [4]. Once established, the results allow the steps in [4] to be followed, often word by word, to get error bounds for specific cases. In (1.2), while the function being minimized only involves $\|v\|_1$, the search domain $D$ may be constrained in terms of $\|v\|_0$ as well as a certain weighted $\ell_1$-norm of $v$. As a result, we get two types of estimators, one being regularized by $\|v\|_0$ and (weighted) $\ell_1$-norms of $v$, the other only by $\ell_1$-norms of $v$. Error bounds for both types of estimators will be derived. The former type of estimators can attain the same order of precision as their $\ell_0$-regularized counterparts studied in [4]. In contrast, although the latter type of estimators are computationally more amenable, in some cases they seem unable to attain the same order of precision, at least with the techniques employed here.

To reduce repetition, we will omit most of the results that can be established directly following [4] and instead focus on those that require new ideas.

2 Main results 

The row vectors and column vectors of $X$ will be denoted by $X_1^\top, \ldots, X_n^\top$ and $V_1, \ldots, V_p$, respectively. We shall assume that $V_j \ne 0$. For $g = (g_1, \ldots, g_n)$ and $x \in \mathbb{R}^n$, where each $g_i : \mathbb{R} \to \mathbb{R}$ is a function, denote $g(x) = (g_1(x_1), \ldots, g_n(x_n))^\top$.

As in [4], to bound the error of the $\ell_1$-regularized estimator in (1.2), our first step is to show that $\hat\beta$ belongs to a set of $v$ that satisfy the following inequality,

$$G(\psi(Xv) - \psi(X\beta)) \le 2\,|\langle \epsilon, \varphi(Xv) - \varphi(X\beta)\rangle| - c_r(\|v\|_1 - \|\beta\|_1), \tag{2.1}$$

where $G : \mathbb{R}^n \to \mathbb{R}$ is a function, $\psi = (\psi_1, \ldots, \psi_n)$, $\varphi = (\varphi_1, \ldots, \varphi_n)$, with $\psi_i$ and $\varphi_i$ being functions from $\mathbb{R}$ to $\mathbb{R}$. In many cases, it is not very hard to get (2.1) for maximum likelihood estimators (MLE) or least squares estimators (LSE). We will illustrate this later. Our focus next is to use (2.1) to derive two error bounds for $\hat\beta$.

2.1 Conditions and general error bounds 

For both error bounds, we need the following condition. 

Condition H1 Given $q \in (0, 1)$, there is $c_1 = c_1(X, \beta, \varphi, q) > 0$, such that

$$\Pr\bigl\{|\langle \epsilon, \varphi(Xv) - \varphi(X\beta)\rangle| \le c_1\sqrt{n}\,\|v - \beta\|_1,\ \text{all } v \in D\bigr\} \ge 1 - c_0 q,$$

where $c_0 > 0$ is an arbitrarily pre-selected constant, such as 1 or 2.

The same condition was used in [4], but with $c_0 = 2$. As remarked in [4], $c_0$ is purely for notational ease when Condition H1 is verified for specific cases. To get the first bound, we also need another condition used in [4].

Condition H2 There is $c_2 = c_2(X, \beta, \psi) > 0$, such that for all $v \in D$,

$$G(\psi(Xv) - \psi(X\beta)) \ge c_2 n \|v - \beta\|_2^2.$$

We now can state the first error bound for $\hat\beta$.

Proposition 2.1 Suppose Conditions H1 and H2 are satisfied. If $\hat\beta \in D$ is a random variable that always satisfies the inequality (2.1) with $c_r = 2c_1\sqrt{n}$, then, letting $\kappa_r = 4c_1/c_2$, $\Pr\{\|\hat\beta - \beta\|_2 \le \kappa_r\sqrt{|\mathrm{spt}(\beta)|/n}\} \ge 1 - c_0 q$.

To get the second bound, we replace Condition H2 with the next one. 

Condition H3 There is $c_3 = c_3(X, \beta, \psi) > 0$, such that for all $z \in \{Xv : v \in D\}$,

$$G(\psi(z) - \psi(X\beta)) \ge c_3 \|z - X\beta\|_2^2.$$

We also need some conditions on the second moments of the column vectors of $X$. Such conditions are sometimes referred to as the coherence property [1; 2]. Let

$$\mu_X = \max_{1 \le i < j \le p} \frac{|V_i^\top V_j|}{\|V_i\|_2 \|V_j\|_2}, \qquad a_X = \min_{1 \le i \le p} \frac{\|V_i\|_2^2}{n}, \qquad b_X = \max_{1 \le i \le p} \frac{\|V_i\|_2^2}{n}.$$
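As a minimal sketch of how these three quantities could be computed for a given design matrix, the following uses plain Python lists. The function name `coherence_quantities` and the row-list representation of $X$ are assumptions made for illustration.

```python
import math


def coherence_quantities(X):
    """Return (mu_X, a_X, b_X) for X given as a list of n rows of length p.

    mu_X: largest absolute normalized inner product between distinct columns.
    a_X, b_X: smallest and largest squared column norm, each divided by n.
    """
    n = len(X)
    V = [list(col) for col in zip(*X)]            # columns V_1, ..., V_p
    sq = [sum(x * x for x in col) for col in V]   # ||V_j||_2^2
    if min(sq) == 0:
        raise ValueError("every column V_j must be nonzero")
    mu = 0.0
    for i in range(len(V)):
        for j in range(i + 1, len(V)):
            dot = sum(s * t for s, t in zip(V[i], V[j]))
            mu = max(mu, abs(dot) / math.sqrt(sq[i] * sq[j]))
    return mu, min(sq) / n, max(sq) / n


X = [[1.0, 1.0],
     [1.0, 0.0]]
mu_X, a_X, b_X = coherence_quantities(X)
print(mu_X, a_X, b_X)
```

The double loop costs on the order of $p^2 n$ operations, which is adequate for a sketch but would normally be replaced by a matrix product for large $p$.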

Proposition 2.2 Suppose Conditions H1 and H3 are satisfied and $a_X + b_X\mu_X > 6 b_X |\mathrm{spt}(\beta)| \mu_X$. Fix $\tau > 0$ such that

$$a_X + b_X\mu_X > 2 b_X (3 + 4\tau) |\mathrm{spt}(\beta)| \mu_X, \tag{2.2}$$

and let $c_r = 2(1 + 1/\tau) c_1 \sqrt{n}$,

$$\kappa_r = \frac{3(2 + 1/\tau)\sqrt{2 + (1 + 2\tau)^2}}{a_X + b_X\mu_X} \cdot \frac{c_1}{c_3}.$$

If $\hat\beta \in D$ is a random variable that always satisfies the inequality (2.1) with the above $c_r$ as the tuning parameter, then $\Pr\{\|\hat\beta - \beta\|_2 \le \kappa_r\sqrt{|\mathrm{spt}(\beta)|/n}\} \ge 1 - c_0 q$.

Since $a_X \le b_X$, (2.2) sets an upper bound on $\mu_X$. To get a moderate value of $\kappa_r$ in Proposition 2.2, $\tau$ has to be moderate. If, say, $\tau = 1$, then by (2.2), $a_X/b_X > (14|\mathrm{spt}(\beta)| - 1)\mu_X$, which further limits the magnitude of $\mu_X$. Under certain conditions, one can get $\mu_X = O(\sqrt{n^{-1}\ln p})$ [2; 4], which is small for large $n$, even when $p$ is much larger than $n$, for example, $p = n^a$ with some $a > 1$.
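To get a feel for how restrictive (2.2) is, the following sketch computes the largest sparsity level allowed by (2.2) for given $\mu_X$, $a_X$, $b_X$, $\tau$, together with the constant $\kappa_r$ of Proposition 2.2. The function names and the sample inputs are hypothetical; the formulas are those displayed above.

```python
import math


def max_sparsity(mu_X, a_X, b_X, tau):
    """Largest integer s with a_X + b_X*mu_X > 2*b_X*(3 + 4*tau)*s*mu_X, as in (2.2)."""
    if mu_X == 0:
        return math.inf                      # (2.2) holds for every s
    bound = (a_X + b_X * mu_X) / (2 * b_X * (3 + 4 * tau) * mu_X)
    return max(math.ceil(bound) - 1, 0)      # largest integer strictly below bound


def kappa_r(mu_X, a_X, b_X, tau, c1, c3):
    """The constant kappa_r of Proposition 2.2."""
    shape = 3 * (2 + 1 / tau) * math.sqrt(2 + (1 + 2 * tau) ** 2)
    return shape / (a_X + b_X * mu_X) * c1 / c3


# Illustrative inputs: normalized columns (a_X = b_X = 1) and coherence 0.01.
s_max = max_sparsity(mu_X=0.01, a_X=1.0, b_X=1.0, tau=1.0)
k = kappa_r(mu_X=0.01, a_X=1.0, b_X=1.0, tau=1.0, c1=1.0, c3=1.0)
print(s_max, k)
```

With these inputs the bound $(a_X + b_X\mu_X)/(14 b_X \mu_X)$ is about 7.2, so supports of size up to 7 are admissible for $\tau = 1$.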

We next make some comments on conditions used in specific cases to establish Conditions H1 - H3. To establish Condition H1, the following tail condition on the errors $\epsilon_i$ is useful: there are $\sigma > 0$ and $c_\epsilon > 1$, such that

$$\Pr\{|a^\top \epsilon| > t\|a\|_2\} \le c_\epsilon e^{-t^2/(2\sigma^2)}, \quad \text{all } t > 0,\ a \in \mathbb{R}^n. \tag{2.3}$$

As remarked in [4], typically $c_\epsilon$ can be set at 2. At the end of the note, we will see that in some cases $c_\epsilon$ has to be set at other values.

To establish Condition H2 or H3, we usually need to put some restrictions on the search domain $D$ in (1.2). To establish Condition H3, which is the less restrictive of the two, we typically choose

$$D \subset D(I) = T^{-1}(I^n) = \{v \in \mathbb{R}^p : X_i^\top v \in I,\ 1 \le i \le n\}, \tag{2.4}$$

where $T$ is the mapping $v \to Xv$ and $I$ is an interval in $\mathbb{R}$. In general, we need not put restrictions on $|\mathrm{spt}(v)|$. On the other hand, to establish Condition H2, we typically start with verifying Condition H3, and then proceed to get $\|X(v - \beta)\|_2 \ge c\|v - \beta\|_2$ for some constant $c > 0$. To do this, we need to put restrictions on $|\mathrm{spt}(v)|$, typically by requiring

$$D \subset D(I, h) = D(I) \cap \{v \in \mathbb{R}^p : |\mathrm{spt}(v)| \le h\},$$

with $h \ge 1$ being bounded in terms of $\mu_X$ (cf. [4]). Thus, though not directly used in Proposition 2.1, the coherence property of $X$ is needed in specific applications of the Proposition.

2.2 Proofs 

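For $v \in \mathbb{R}^p$ and $S \subset \{1, \ldots, p\}$, denote $v_S = (x_1, \ldots, x_p)^\top$ with $x_i = v_i \mathbf{1}\{i \in S\}$. Let $d = v - \beta$. Then for any $S \supset \mathrm{spt}(\beta)$, we have $v = \beta + d_S + v_{S^c}$ and

$$\|v\|_1 = \|\beta + d_S\|_1 + \|v_{S^c}\|_1, \qquad \|d\|_a^a = \|d_S\|_a^a + \|v_{S^c}\|_a^a, \quad \text{for any } a > 0. \tag{2.5}$$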
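The decomposition (2.5) relies only on $\beta$ vanishing outside $S$, so that $d$ and $v$ agree on $S^c$. A small numerical check, with made-up vectors chosen purely for illustration, looks as follows.

```python
def restrict(v, S):
    """v_S: keep the coordinates of v indexed by S, zero out the rest."""
    return [vi if i in S else 0.0 for i, vi in enumerate(v)]


def norm_pow(v, a):
    """||v||_a^a, the a-th power of the l_a-norm."""
    return sum(abs(x) ** a for x in v)


beta = [2.0, 0.0, -1.0, 0.0, 0.0]      # spt(beta) = {0, 2}
v = [1.5, 0.25, -1.0, 0.0, -0.5]
S = {0, 2, 3}                          # any superset of spt(beta)
Sc = set(range(len(v))) - S

d = [vi - bi for vi, bi in zip(v, beta)]
d_S, v_Sc = restrict(d, S), restrict(v, Sc)
beta_plus_dS = [b + x for b, x in zip(beta, d_S)]

# First identity in (2.5): ||v||_1 = ||beta + d_S||_1 + ||v_{S^c}||_1.
lhs1, rhs1 = norm_pow(v, 1), norm_pow(beta_plus_dS, 1) + norm_pow(v_Sc, 1)
# Second identity in (2.5) with a = 2.
lhs2, rhs2 = norm_pow(d, 2), norm_pow(d_S, 2) + norm_pow(v_Sc, 2)
print(lhs1, rhs1, lhs2, rhs2)
```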

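Proof of Proposition 2.1. Let $d = \hat\beta - \beta$. Because $\hat\beta$ always satisfies (2.1), by Conditions H1 and H2, with probability at least $1 - c_0 q$,

$$\begin{aligned} c_2 n \|d\|_2^2 &\le 2c_1\sqrt{n}\,\|d\|_1 - c_r(\|\hat\beta\|_1 - \|\beta\|_1) \\ &= 2c_1\sqrt{n}\,(\|d\|_1 + \|\beta\|_1 - \|\hat\beta\|_1). \end{aligned}$$

Let $S = \mathrm{spt}(\beta)$. Apply (2.5) to the right hand side of the above inequality to get

$$\begin{aligned} c_2 n \|d\|_2^2 &\le 2c_1\sqrt{n}\,(\|d_S\|_1 + \|\hat\beta_{S^c}\|_1 + \|\beta\|_1 - \|\beta + d_S\|_1 - \|\hat\beta_{S^c}\|_1) \\ &= 2c_1\sqrt{n}\,(\|d_S\|_1 + \|\beta\|_1 - \|\beta + d_S\|_1). \end{aligned}$$

Then by the Minkowski inequality and the Cauchy-Schwarz inequality,

$$\|d\|_2^2 \le 4(c_1/c_2)\|d_S\|_1/\sqrt{n} \le \kappa_r\sqrt{|S|/n}\,\|d_S\|_2.$$

Because $\|d\|_2^2 = \|d_S\|_2^2 + \|\hat\beta_{S^c}\|_2^2$ by (2.5), the above inequalities, applied with $x = \|d_S\|_2$ and $y = \|\hat\beta_{S^c}\|_2$, imply

$$\|d\|_2^2 \le M := \sup\bigl\{x^2 + y^2 : x \ge 0 \text{ and } y \ge 0 \text{ satisfy } x^2 + y^2 \le \kappa_r\sqrt{|S|/n}\,x\bigr\}.$$

To find $M$, first, in order that $x^2 + y^2 \le \kappa_r\sqrt{|S|/n}\,x$, there must be $\kappa_r^2 |S|/n \ge 4y^2$. Given $y \ge 0$ satisfying the condition, the maximum possible $x$ is

$$x(y) = \tfrac{1}{2}\Bigl[\kappa_r\sqrt{|S|/n} + \sqrt{\kappa_r^2 |S|/n - 4y^2}\Bigr].$$

It is seen that

$$x(y)^2 + y^2 = \frac{\kappa_r^2 |S|/n + \kappa_r\sqrt{|S|/n}\,\sqrt{\kappa_r^2 |S|/n - 4y^2}}{2} \le \kappa_r^2 |S|/n.$$

Therefore, $M = \kappa_r^2 |S|/n$, where the maximum is obtained if and only if $x = \kappa_r\sqrt{|S|/n}$ and $y = 0$. This yields $\|d\|_2 \le \sqrt{M} = \kappa_r\sqrt{|S|/n}$, as desired. □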

Proof of Proposition 2.2. It suffices to show that

$$\Pr\bigl\{\|v - \beta\|_2 \le \kappa_r\sqrt{|\mathrm{spt}(\beta)|/n} \text{ for all } v \in D \text{ satisfying (2.1)}\bigr\} \ge 1 - c_0 q. \tag{2.6}$$

By Conditions H1 and H3, with probability at least $1 - c_0 q$, the inequality

$$c_3 \|X(v - \beta)\|_2^2 \le 2c_1\sqrt{n}\,\|v - \beta\|_1 - 2c_1(1 + 1/\tau)\sqrt{n}\,(\|v\|_1 - \|\beta\|_1)$$

holds for all $v \in D$ satisfying (2.1). Fix one such $v$ and an arbitrary $S \supset \mathrm{spt}(\beta)$. Let $d = v - \beta$. By (2.5),

$$\begin{aligned} c_3 \|Xd\|_2^2 &\le 2c_1\sqrt{n}\,(\|d_S\|_1 + \|v_{S^c}\|_1) - 2c_1(1 + 1/\tau)\sqrt{n}\,(\|\beta + d_S\|_1 + \|v_{S^c}\|_1 - \|\beta\|_1) \\ &= 2c_1\sqrt{n}\,\|d_S\|_1 - 2c_1(1 + 1/\tau)\sqrt{n}\,(\|\beta + d_S\|_1 - \|\beta\|_1) - 2(c_1/\tau)\sqrt{n}\,\|v_{S^c}\|_1. \end{aligned}$$

For ease of notation, denote $\bar c_1 = c_1/c_3$ for now. By the Minkowski inequality, $\|\beta + d_S\|_1 - \|\beta\|_1 \ge -\|d_S\|_1$, and so

$$\|Xd\|_2^2 \le 2\bar c_1(2 + 1/\tau)\sqrt{n}\,\|d_S\|_1 - (2\bar c_1/\tau)\sqrt{n}\,\|v_{S^c}\|_1. \tag{2.7}$$

First of all, since the left hand side of (2.7) is nonnegative, it follows that

$$\|v_{S^c}\|_1 \le (1 + 2\tau)\|d_S\|_1. \tag{2.8}$$

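On the other hand, by $Xd = Xd_S + Xv_{S^c}$,

$$\|Xd\|_2^2 = \|Xd_S\|_2^2 + \|Xv_{S^c}\|_2^2 + 2\langle Xd_S, Xv_{S^c}\rangle \ge \|Xd_S\|_2^2 - 2|\langle Xd_S, Xv_{S^c}\rangle|.$$

We next derive a lower bound of $\|Xd\|_2^2$. First, by $Xd_S = \sum_{i \in S} d_i V_i$,

$$\begin{aligned} \|Xd_S\|_2^2 &= \sum_{i \in S} d_i^2 \|V_i\|_2^2 + \sum_{i, j \in S,\ i \ne j} d_i d_j (V_i^\top V_j) \\ &\ge \sum_{i \in S} d_i^2 \|V_i\|_2^2 - \sum_{i, j \in S,\ i \ne j} |d_i d_j|\,|V_i^\top V_j|. \end{aligned}$$

Because $\|V_i\|_2^2 \ge a_X n$ and for $i \ne j$, $|V_i^\top V_j| \le \mu_X \|V_i\|_2 \|V_j\|_2 \le b_X \mu_X n$, we get

$$\begin{aligned} \|Xd_S\|_2^2 &\ge a_X n \sum_{i \in S} d_i^2 - b_X \mu_X n \sum_{i, j \in S,\ i \ne j} |d_i d_j| \\ &= (a_X + b_X\mu_X) n \|d_S\|_2^2 - b_X \mu_X n \|d_S\|_1^2. \end{aligned}$$

Second, by $Xv_{S^c} = \sum_{j \notin S} v_j V_j$,

$$\begin{aligned} |\langle Xd_S, Xv_{S^c}\rangle| &= \Bigl|\sum_{i \in S,\ j \notin S} d_i v_j V_i^\top V_j\Bigr| \le \sum_{i \in S,\ j \notin S} |d_i v_j|\,|V_i^\top V_j| \\ &\le b_X \mu_X n \sum_{i \in S,\ j \notin S} |d_i v_j| = b_X \mu_X n \|d_S\|_1 \|v_{S^c}\|_1. \end{aligned}$$

Therefore, putting the above inequalities together,

$$\|Xd\|_2^2 \ge (a_X + b_X\mu_X) n \|d_S\|_2^2 - b_X \mu_X n \|d_S\|_1^2 - 2 b_X \mu_X n \|d_S\|_1 \|v_{S^c}\|_1. \tag{2.9}$$

Combining (2.7) and (2.9), and then grouping the terms, we get

$$\begin{aligned} (a_X + b_X\mu_X) n \|d_S\|_2^2 \le{} & b_X \mu_X n \|d_S\|_1^2 + 2\bar c_1(2 + 1/\tau)\sqrt{n}\,\|d_S\|_1 \\ & + 2\bigl\{b_X \mu_X \sqrt{n}\,\|d_S\|_1 - \bar c_1/\tau\bigr\}\sqrt{n}\,\|v_{S^c}\|_1. \end{aligned} \tag{2.10}$$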

So far, other than the requirement that $S \supset \mathrm{spt}(\beta)$, the choice of $S$ is arbitrary. To continue, we need the next result that puts more constraints on $S$.

Lemma 2.3 Suppose $S \supset \mathrm{spt}(\beta)$ such that $a_X + b_X\mu_X > b_X \mu_X (3 + 4\tau)|S|$. Then $b_X \mu_X \sqrt{n}\,\|d_S\|_1 \le \bar c_1/\tau$.

Assume the lemma is true for now. Let $S \supset \mathrm{spt}(\beta)$ such that $a_X + b_X\mu_X > b_X \mu_X (3 + 4\tau)|S|$. Later we will see that such $S$ indeed exists and make specific choices for it. By (2.10), Lemma 2.3, and the Cauchy-Schwarz inequality,

$$\begin{aligned} (a_X + b_X\mu_X) n \|d_S\|_2^2 &\le b_X \mu_X n \|d_S\|_1^2 + 2\bar c_1(2 + 1/\tau)\sqrt{n}\,\|d_S\|_1 \\ &\le b_X \mu_X n |S| \|d_S\|_2^2 + 2\bar c_1(2 + 1/\tau)\sqrt{n|S|}\,\|d_S\|_2 \\ &\le (a_X + b_X\mu_X) n \|d_S\|_2^2/3 + 2\bar c_1(2 + 1/\tau)\sqrt{n|S|}\,\|d_S\|_2, \end{aligned}$$

where the last inequality is due to $a_X + b_X\mu_X > 3 b_X \mu_X |S|$. Thus

$$\|d_S\|_2 \le \frac{3\bar c_1(2 + 1/\tau)\sqrt{|S|}}{(a_X + b_X\mu_X)\sqrt{n}}. \tag{2.11}$$

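Let $S_1$ be the union of $\mathrm{spt}(\beta)$ and the set of $i \notin \mathrm{spt}(\beta)$ with the $|\mathrm{spt}(\beta)|$ largest $|d_i|$ outside $\mathrm{spt}(\beta)$. By Lemma 3.1 of [3],

$$\|d\|_2^2 \le \|d_{S_1}\|_2^2 + \frac{\|d_{\mathrm{spt}(\beta)^c}\|_1^2}{|\mathrm{spt}(\beta)|}. \tag{2.12}$$

Since $d_{\mathrm{spt}(\beta)^c} = v_{\mathrm{spt}(\beta)^c}$, by (2.8) and the Cauchy-Schwarz inequality,

$$\|d_{\mathrm{spt}(\beta)^c}\|_1 \le (1 + 2\tau)\|d_{\mathrm{spt}(\beta)}\|_1 \le (1 + 2\tau)\sqrt{|\mathrm{spt}(\beta)|}\,\|d_{\mathrm{spt}(\beta)}\|_2,$$

which together with (2.12) yields

$$\|d\|_2^2 \le \|d_{S_1}\|_2^2 + (1 + 2\tau)^2 \|d_{\mathrm{spt}(\beta)}\|_2^2. \tag{2.13}$$

Note $|S_1| = 2|\mathrm{spt}(\beta)|$. By the assumption in (2.2) and Lemma 2.3, it is seen that (2.11) holds for $S = S_1$ and for $S = \mathrm{spt}(\beta)$. Combine this with (2.13) to get

$$\|d\|_2 \le \frac{3\bar c_1(2 + 1/\tau)\sqrt{[2 + (1 + 2\tau)^2]\,|\mathrm{spt}(\beta)|}}{(a_X + b_X\mu_X)\sqrt{n}} = \kappa_r\sqrt{|\mathrm{spt}(\beta)|/n},$$

where we have recovered $\bar c_1 = c_1/c_3$. The proof of (2.6) is then complete. □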

Proof of Lemma 2.3. Assume the opposite were true, i.e. $b_X \mu_X \sqrt{n}\,\|d_S\|_1 > \bar c_1/\tau$. Then clearly $d_S \ne 0$. By (2.8), the sum of the last two terms on the right hand side of (2.10) is no greater than

$$2\bar c_1(2 + 1/\tau)\sqrt{n}\,\|d_S\|_1 + 2\bigl\{b_X \mu_X \sqrt{n}\,\|d_S\|_1 - \bar c_1/\tau\bigr\}\sqrt{n}\,(1 + 2\tau)\|d_S\|_1 = 2 b_X \mu_X n (1 + 2\tau)\|d_S\|_1^2,$$

so (2.10) together with the Cauchy-Schwarz inequality yields $(a_X + b_X\mu_X)\|d_S\|_2^2 \le b_X \mu_X (3 + 4\tau)\|d_S\|_1^2 \le b_X \mu_X (3 + 4\tau)|S|\,\|d_S\|_2^2$. Since $d_S \ne 0$, then $a_X + b_X\mu_X \le b_X \mu_X (3 + 4\tau)|S|$, which contradicts the assumption. □



3 MLE for exponential linear models and LSE for analytic models

In [4], by choosing a suitable search domain $D$, we derived error bounds for the $\ell_0$-regularized MLE and LSE for exponential linear models and analytic models, respectively. Under the conditions in Proposition 2.1, similar error bounds can be derived for the $\ell_1$-regularized MLE and LSE, by following almost verbatim the steps in [4]. For brevity, we shall omit the details. Instead, we shall focus on how to get error bounds under the conditions in Proposition 2.2.



3.1 Exponential linear models 



Let $\{p(y; t) : t \in I\}$ be a family of probability densities with respect to a nonzero Borel measure $\mu$ on $\mathbb{R}$, where $I \subset \mathbb{R}$ is a closed interval, such that

$$p(y; t) = \exp\{ty - \Lambda(t)\}, \quad \text{with } \Lambda(t) = \ln\Bigl[\int e^{ty}\,\mu(dy)\Bigr], \quad t \in I.$$

Suppose $y_1, \ldots, y_n$ are independent, with $y_i$ having density $p(y; X_i^\top \beta)$. Let $D = D(I)$, where $D(I)$ is defined in (2.4). Assume $\beta \in D$, i.e. $X_i^\top \beta \in I$ for each $i$. The $\ell_1$-regularized MLE for $\beta$ is

$$\hat\beta = \arg\max_{v \in D}\Bigl[y^\top Xv - \sum_{i=1}^n \Lambda(X_i^\top v) - c_r\|v\|_1\Bigr].$$

Let $\epsilon_i = y_i - E(y_i) = y_i - \Lambda'(X_i^\top \beta)$, $G(x) = \sum_{i=1}^n x_i$, $\psi_i(z) = \Lambda(z) - \Lambda'(X_i^\top \beta)z$, and $\varphi_i(z) = z/2$. Then it can be seen that $\hat\beta$ satisfies the inequality (2.1).
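The claim that the regularized MLE satisfies (2.1) follows from a short calculation, sketched here under the stated assumption $\beta \in D$ and with the choices of $G$, $\psi_i$, $\varphi_i$ given above. Writing $d = \hat\beta - \beta$ is a notational convenience borrowed from Section 2.2.

```latex
% Step 1: \hat\beta maximizes the penalized log-likelihood over D and \beta \in D, so
\sum_{i=1}^n \Lambda(X_i^\top \hat\beta) - \sum_{i=1}^n \Lambda(X_i^\top \beta)
  - y^\top X d \;\le\; -c_r\bigl(\|\hat\beta\|_1 - \|\beta\|_1\bigr).

% Step 2: substitute y_i = \Lambda'(X_i^\top\beta) + \epsilon_i, so that
% y^\top X d = \sum_i \Lambda'(X_i^\top\beta)\, X_i^\top d + \langle \epsilon, X d\rangle:
\sum_{i=1}^n \bigl[\psi_i(X_i^\top \hat\beta) - \psi_i(X_i^\top \beta)\bigr]
  \;\le\; \langle \epsilon, X d\rangle - c_r\bigl(\|\hat\beta\|_1 - \|\beta\|_1\bigr).

% Step 3: with \varphi_i(z) = z/2,
% \langle \epsilon, X d\rangle = 2\langle \epsilon, \varphi(X\hat\beta) - \varphi(X\beta)\rangle,
% and bounding this by its absolute value gives (2.1):
G\bigl(\psi(X\hat\beta) - \psi(X\beta)\bigr)
  \;\le\; 2\,\bigl|\langle \epsilon, \varphi(X\hat\beta) - \varphi(X\beta)\rangle\bigr|
  - c_r\bigl(\|\hat\beta\|_1 - \|\beta\|_1\bigr).
```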

Following almost verbatim the proof of Lemma 6.1 in [4], if $\epsilon$ satisfies the tail condition (2.3), then Condition H1 is satisfied by setting $c_0 = c_\epsilon$ and

$$c_1 = \sigma\sqrt{\frac{\ln(p/q)}{2n}}\,\max_{1 \le j \le p}\|V_j\|_2.$$



On the other hand, in [4], it was actually also shown that for each $v \in D(I)$, $G(\psi(Xv) - \psi(X\beta)) \ge (1/2)\inf_{t \in I}\Lambda''(t) \times \|X(v - \beta)\|_2^2$. As a result, we can set

$$c_3 = (1/2)\inf_{t \in I}\Lambda''(t).$$

If $\inf_{t \in I}\Lambda''(t) > 0$, then, provided (2.2) in Proposition 2.2 is satisfied,

$$\Pr\Bigl\{\|\hat\beta - \beta\|_2 \le \frac{3(2 + 1/\tau)\sqrt{2 + (1 + 2\tau)^2}}{a_X + b_X\mu_X} \times \frac{\sqrt{2\ln(p/q)}\,\sigma\sqrt{|\mathrm{spt}(\beta)|}\,\max_{1 \le j \le p}\|V_j\|_2}{n\,\inf_{t \in I}\Lambda''(t)}\Bigr\} \ge 1 - c_\epsilon q.$$

In particular, for the logistic model, where $\Lambda(t) = \ln(1 + e^t)$, since $\epsilon_i = y_i - \Lambda'(X_i^\top \beta)$ with $y_i = 0$ or 1, we can set $\sigma = 1/2$ by Hoeffding's inequality [7]. Furthermore, by $\Lambda''(t) = (2\cosh(t/2))^{-2}$, $\inf_{t \in I}\Lambda''(t) > 0$ for bounded $I$.
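For the logistic model the constants $c_1$ and $c_3$ are explicit, which the following sketch illustrates. It assumes a symmetric interval $I = [-R, R]$ (an assumption made here so that the infimum of $\Lambda''$ is attained at $|t| = R$), and the function names are hypothetical.

```python
import math


def lambda2(t):
    """Second derivative of Lambda(t) = ln(1 + e^t)."""
    return math.exp(t) / (1.0 + math.exp(t)) ** 2


def logistic_constants(col_norms, n, p, q, R):
    """Return (c_1, c_3) for the logistic model with I = [-R, R] and sigma = 1/2.

    col_norms: the Euclidean norms ||V_j||_2 of the columns of X.
    c_1 follows the formula obtained from Lemma 6.1 of [4] as quoted in the text;
    c_3 = (1/2) * inf over I of Lambda''. Since Lambda'' is even and decreasing
    in |t|, the infimum over [-R, R] is Lambda''(R).
    """
    sigma = 0.5
    c1 = sigma * math.sqrt(math.log(p / q) / (2.0 * n)) * max(col_norms)
    c3 = 0.5 * lambda2(R)
    return c1, c3


# Numerical check of the identity Lambda''(t) = (2 cosh(t/2))^(-2).
for t in (-3.0, 0.0, 0.7, 2.5):
    assert abs(lambda2(t) - (2.0 * math.cosh(t / 2.0)) ** -2) < 1e-12

c1, c3 = logistic_constants(col_norms=[10.0] * 3, n=100, p=3, q=0.05, R=0.0)
print(c1, c3)
```

Because $\Lambda''(R)$ decays like $e^{-R}$, a wide interval $I$ makes $c_3$ small and the resulting bound loose, which is the practical meaning of requiring $I$ to be bounded.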



3.2 Analytic models 

Suppose $y = f(X^\top \beta) + \epsilon$, where $\epsilon = (\epsilon_1, \ldots, \epsilon_n)^\top$ has mean 0 and $f$ is defined on a closed interval $I \subset \mathbb{R}$ with positive length. Also, suppose $f$ can be continuously extended into an analytic function on an open domain $N \subset \mathbb{C}$ that contains $I$. Now let $D \subset D(I)$ and assume $\beta \in D$. The $\ell_1$-regularized LSE for $\beta$ is

$$\hat\beta = \arg\min_{v \in D}\bigl[\|y - f(Xv)\|_2^2 + c_r\|v\|_1\bigr].$$

If we set $G(x) = \|x\|_2^2$ and $\psi_i(z) = \varphi_i(z) = f(z)$, then it can be seen that $\hat\beta$ satisfies (2.1), and for $v \in D(I)$, $G(\psi(Xv) - \psi(X\beta)) \ge d(f, I)^2\|X(v - \beta)\|_2^2$ [4], where

$$d(f, I) = \inf\Bigl\{\frac{|f(x) - f(y)|}{|x - y|} : x \in I,\ y \in I,\ x \ne y\Bigr\}.$$

Therefore, if $d(f, I) > 0$, then we can set $c_3 = d(f, I)^2$.
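The quantity $d(f, I)$ can be explored numerically. The sketch below takes the minimum difference quotient over a finite grid, which only yields an upper bound on the true infimum, so it is a diagnostic rather than a certified value. The choice $f = \tanh$ on $[-1, 1]$ and the name `d_grid` are assumptions for illustration; for a differentiable monotone $f$ the true value equals the infimum of $|f'|$ over $I$ by the mean value theorem.

```python
import math


def d_grid(f, lo, hi, m=200):
    """Minimum of |f(x) - f(y)| / |x - y| over all pairs of an (m+1)-point grid.

    This is an upper bound on d(f, [lo, hi]), since the infimum there is
    taken over all pairs in the interval, not only over grid pairs.
    """
    xs = [lo + (hi - lo) * k / m for k in range(m + 1)]
    best = math.inf
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            best = min(best, abs(f(xs[i]) - f(xs[j])) / (xs[j] - xs[i]))
    return best


approx = d_grid(math.tanh, -1.0, 1.0)
exact = 1.0 - math.tanh(1.0) ** 2      # inf of tanh' on [-1, 1], attained at |t| = 1
print(approx, exact, approx ** 2)      # the last value approximates c_3 = d(f, I)^2
```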

In order to apply Proposition 2.2, we also need to get $c_1$ for Condition H1. We consider two cases.

In the first case, $D = D(I) \cap \{v \in \mathbb{R}^p : \|v\|_{1,\infty} \le \theta\varrho/2\}$ and is compact, where $\theta \in (0, 1)$, $\varrho > 0$ such that $\{z \in \mathbb{C} : |z| \le \varrho\} \subset N$, and

$$\|v\|_{1,\infty} = \sum_{i=1}^p |v_i|\,\|V_i\|_\infty.$$

Let $\sigma$ be as in the tail condition (2.3). Given $q \in (0, 1)$, let $\lambda_p = \ln[p(1 + q^{-1})]$. As stated in Proposition 6.5 in [4], we can set

$$c_1 = \sigma\sqrt{2\lambda_p}\,\sum_{k=1}^{\infty} \cdots$$

Then by Proposition 2.2, we get an error bound of the same order as the $\ell_0$-regularized estimator in [4]. Note that the constraints on $D$ include a bound on the weighted $\ell_1$-norm $\|v\|_{1,\infty}$ but no limits on $|\mathrm{spt}(v)|$. As a result, the LSE is purely regularized by the $\ell_1$-norms $\|v\|_1$ and $\|v\|_{1,\infty}$.

Second, $D = D(I)$ and is compact, but not necessarily contained in a disc on which $f$ is analytic. Again, the LSE is purely regularized by $\ell_1$-norms of $v$. However, it becomes harder to set $c_1$. A relatively simple choice of $c_1$ is as follows. Let $\varrho > 0$, such that for any $x \in I$, $\{z \in \mathbb{C} : |z - x| \le \varrho\} \subset N$. Let $d_k = \sup_{x \in I}|f^{(k)}(x)|/k!$, and $\delta(D)$ be the infimum of the radii of spheres under $\|\cdot\|_{1,\infty}$ that contain $D$, i.e.,

$$\delta(D) = \inf\{a \ge 0 : \text{there is } u \in \mathbb{R}^p \text{ such that } \|v - u\|_{1,\infty} \le a \text{ for all } v \in D\}.$$

Then, given $\varrho_1 \in (0, \varrho)$, we can set

$$c_1 = \sigma\sum_{k=1}^{\infty} k\sqrt{2p\ln(pQ) + k\lambda_p}\; d_k\,\varrho_1^{k-1} \times \cdots \max_{1 \le i \le p}\|V_i\|_{2k}, \tag{3.1}$$

where $Q = 4\delta(D)/\varrho_1 + 1$. This value of $c_1$ results from Proposition 5.5 (2) in [4] by noting the trivial bound $|\mathrm{spt}(v)| \le p$, which is nevertheless the tightest we can get, as no explicit constraints on $|\mathrm{spt}(v)|$ are available.

Unfortunately, if we use (3.1) to set $c_1$, then, in order for the error bound in Proposition 2.2 to be at most of order $o(1)$, $p$ cannot be very large. Indeed, as the error bound is proportional to $c_1\sqrt{|\mathrm{spt}(\beta)|/n} \ge c\sqrt{|\mathrm{spt}(\beta)|\,p\ln p/n}$ for some $c > 0$, $p$ has to be of order $o(n/\ln n)$.

3.3 Regression with noise-corrupted underlying linear structure

It is possible to generalize the treatment for analytic models to the following one

$$y = f(X^\top \beta + \xi) + \epsilon, \tag{3.2}$$

where $\xi_1, \ldots, \xi_n, \epsilon_1, \ldots, \epsilon_n$ are independent with mean 0, and the $\xi_i$'s are identically distributed. The model reflects the point of view that noise can appear anywhere.

For nonlinear $f$, in general, if the common distribution of the $\xi_i$'s is unknown, then $E(y_i)$ are unknown and regression becomes impossible. If, on the other hand, the distribution is known, then the functions $z \mapsto E[f(z + \xi_i)]$ are known. Apparently, they are identical. Denote $g(z) = E[f(z + \xi_i)]$ and let $\delta_i = f(X_i^\top \beta + \xi_i) - g(X_i^\top \beta) + \epsilon_i$. Then

$$y = g(X^\top \beta) + \delta. \tag{3.3}$$

Note that, in general, the distributions of $\delta_i$ depend on $X_i^\top \beta$. Since the latter are not identical, $\delta_1, \ldots, \delta_n$ are not identically distributed. Furthermore, since $\beta$ is unknown, in general, even if the distributions of $\epsilon_i$ are known, the distributions of $\delta_i$ are still unknown. Despite this, by only using the fact that the $\delta_i$ are independent, each with mean 0, it is possible to apply the results in previous sections to (3.3), hence getting error bounds of estimation for (3.2).

To make this work, we need to check a few conditions, such as the analyticity of $g(z)$ and the tail condition (2.3) for $\delta$. We next present a case where the necessary conditions are satisfied.

Suppose we set $D = D(I)$ with $I = [-R, R]$. Suppose the $\xi_i$ are bounded random variables with $|\xi_i| \le r$ and there is $R_0 > R + r$, such that $f$ is continuous on $A_0 := \{z \in \mathbb{C} : |z| \le R_0\}$ and analytic within it. Let $A = \{z \in \mathbb{C} : |z| < R_0 - r\}$. For each $z \in A$, by $z + \xi_i \in A_0$, $|f(z + \xi_i)| \le \sup_{A_0}|f| < \infty$, so $g(z) = E[f(z + \xi_i)]$ is well defined. Clearly, $I$ is contained within $A$.

Proposition 3.1 (1) $g(z)$ is analytic on $A$ and $d(g, I) \ge d(f, [-R_0, R_0])$.

(2) If $\epsilon_1, \ldots, \epsilon_n$ satisfy (2.3) for some $\sigma > 0$ and $c_\epsilon > 0$, then $\delta_1, \ldots, \delta_n$ satisfy (2.3) as well for possibly different values of $\sigma$ and $c_\epsilon$. Moreover, if the $\epsilon_i$ are bounded, then $c_\epsilon$ can always be set at 2.

Thus, the results on the $\ell_1$-regularized LSE in previous sections can be applied to (3.3). We omit the details and will only prove the Proposition.

Proof. (1) Given $z \in A$, for every possible value of $\xi_i$, we have $f(z + \xi_i) = \sum_{k=0}^{\infty} f^{(k)}(\xi_i) z^k/k!$. By Cauchy's contour integral,

$$\frac{|f^{(k)}(\xi_i)|}{k!} \le \frac{1}{2\pi}\int_{|\zeta| = R_0}\frac{|f(\zeta)|\,|d\zeta|}{|\zeta - \xi_i|^{k+1}} \le \frac{R_0\sup_{A_0}|f|}{(R_0 - r)^{k+1}}.$$

Because $R_0 - r > |z|$,

$$\sum_{k=0}^{\infty}\frac{|f^{(k)}(\xi_i)|}{k!}|z|^k \le \frac{R_0\sup_{A_0}|f|}{R_0 - r}\sum_{k=0}^{\infty}\Bigl(\frac{|z|}{R_0 - r}\Bigr)^k < \infty.$$

Then by dominated convergence, it is seen that $g(z) = \sum_{k=0}^{\infty} E[f^{(k)}(\xi_i)] z^k/k!$, with the power series being convergent on $A$. Therefore $g(z)$ is analytic on $A$.

To get $d(g, I) \ge d(f, [-R_0, R_0])$, let the right hand side be positive. Then $f$ is monotone on $[-R_0, R_0]$, say, increasing. Then $g(z) = E[f(z + \xi_i)]$ is increasing on $I$ and for $x < y$, $g(y) - g(x) = E[f(y + \xi_i) - f(x + \xi_i)] \ge d(f, [-R_0, R_0])(y - x)$, finishing the proof of (1).

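(2) Let $\eta_i = f(X_i^\top \beta + \xi_i) - g(X_i^\top \beta)$. Then $\operatorname{ess\,sup}\eta_i - \operatorname{ess\,inf}\eta_i \le 2\sup_{A_0}|f|$ and $\delta_i = \eta_i + \epsilon_i$. Given $t > 0$ and $a \in \mathbb{R}^n$,

$$\begin{aligned} \Pr\{|a^\top \delta| > t\|a\|_2\} &\le \Pr\{|a^\top \eta| > t\|a\|_2/2\} + \Pr\{|a^\top \epsilon| > t\|a\|_2/2\} \\ &\le 2\exp\Bigl\{-\frac{t^2}{8\sup_{A_0}|f|^2}\Bigr\} + c_\epsilon\exp\Bigl\{-\frac{t^2}{8\sigma^2}\Bigr\}, \end{aligned}$$

where the last inequality is due to Hoeffding's inequality and the tail condition (2.3). This implies the first claim of (2). If the $\epsilon_i$ are bounded, then the $\delta_i$ are bounded, and the second claim follows from Hoeffding's inequality. □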
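The first claim of (2) can be made explicit. Writing $M = \sup_{A_0}|f|$, both exponentials in the last display are at most $\exp\{-t^2/(8\max(M, \sigma)^2)\}$, so the $\delta_i$ satisfy (2.3) with $\sigma$ replaced by $2\max(M, \sigma)$ and $c_\epsilon$ replaced by $2 + c_\epsilon$. This particular pair is one valid choice, not necessarily the sharpest, and the function names below are hypothetical. The sketch checks the domination on a grid of $t$ values.

```python
import math


def combined_tail(t, M, sigma, c_eps):
    """Right hand side of the last display: Hoeffding term plus the (2.3) term."""
    return 2.0 * math.exp(-t * t / (8.0 * M * M)) \
        + c_eps * math.exp(-t * t / (8.0 * sigma * sigma))


def delta_tail_constants(M, sigma, c_eps):
    """One valid (sigma', c') so that delta satisfies (2.3) with these constants."""
    return 2.0 * max(M, sigma), 2.0 + c_eps


M, sigma, c_eps = 1.5, 0.5, 2.0          # illustrative values only
sigma_new, c_new = delta_tail_constants(M, sigma, c_eps)

# Check combined_tail(t) <= c' * exp(-t^2 / (2 sigma'^2)) on a grid of t > 0.
ok = all(
    combined_tail(t, M, sigma, c_eps)
    <= c_new * math.exp(-t * t / (2.0 * sigma_new ** 2)) + 1e-15
    for t in (0.1 * k for k in range(1, 101))
)
print(sigma_new, c_new, ok)
```

This also illustrates why $c_\epsilon$ may have to be set at a value other than 2 for model (3.3) when the $\epsilon_i$ are unbounded.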